A dynamically growing self-organizing tree (DGSOT) for hierarchical clustering gene expression profiles
Abstract
MOTIVATION The increasing use of microarray technologies is generating large amounts of data that must be processed in order to extract useful and rational fundamental patterns of gene expression. Hierarchical clustering is one method used to analyze gene expression data, but traditional hierarchical clustering algorithms suffer from several drawbacks (e.g. a fixed topology structure; mis-clustered data that cannot be reevaluated). In this paper, we introduce a new hierarchical clustering algorithm that overcomes some of these drawbacks.
RESULTS We propose a new tree-structured self-organizing neural network, called the dynamically growing self-organizing tree (DGSOT) algorithm, for hierarchical clustering. The DGSOT constructs a hierarchy from top to bottom by division. At each hierarchical level, the DGSOT optimizes the number of clusters, from which the proper hierarchical structure of the underlying dataset can be found. In addition, we propose a new cluster validation criterion, based on the geometric property of the Voronoi partition of the dataset, to find the proper number of clusters at each hierarchical level. This criterion uses the Minimum Spanning Tree (MST) concept of graph theory and is computationally inexpensive for large datasets. A K-level up distribution (KLD) mechanism, which increases the scope of data distribution during hierarchy construction, was used to improve the clustering accuracy. The KLD mechanism allows data misclustered in the early stages to be reevaluated at a later stage, which increases the accuracy of the final clustering result. The clustering result of the DGSOT is easily displayed as a dendrogram for visualization. Based on a yeast cell cycle microarray expression dataset, we found that our algorithm extracts gene expression patterns at different levels. Furthermore, the biological functionality enrichment in the clusters is considerably high and the hierarchical structure of the clusters is more reasonable.
AVAILABILITY DGSOT is available upon request from the authors.
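The MST-based validation idea mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the exact criterion, so the score below is an assumption made for illustration only: it builds a minimum spanning tree over cluster centroids (Prim's algorithm) and divides the shortest MST edge by the largest mean cluster radius. The function names `centroid`, `mst_edges`, and `separation_score` are hypothetical and are not taken from the DGSOT software.

```python
import math


def centroid(points):
    """Mean vector of a non-empty list of equal-length numeric points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def mst_edges(nodes):
    """Prim's algorithm on the complete Euclidean graph over `nodes`.

    Returns a list of (i, j, distance) tuples, one per MST edge.
    """
    if len(nodes) < 2:
        return []
    # best[j] = (distance to the tree, tree node that gives that distance)
    best = {j: (math.dist(nodes[0], nodes[j]), 0) for j in range(1, len(nodes))}
    edges = []
    while best:
        j = min(best, key=lambda k: best[k][0])
        d, i = best.pop(j)
        edges.append((i, j, d))
        # Node j just joined the tree; relax distances of the remaining nodes.
        for k in best:
            dk = math.dist(nodes[j], nodes[k])
            if dk < best[k][0]:
                best[k] = (dk, j)
    return edges


def separation_score(clusters):
    """Illustrative score (an assumption, not the paper's criterion):
    shortest MST edge between centroids / largest mean cluster radius.
    Expects at least two non-empty clusters.
    """
    cents = [centroid(c) for c in clusters]
    min_gap = min(d for _, _, d in mst_edges(cents))
    max_radius = max(
        sum(math.dist(p, c) for p in pts) / len(pts)
        for pts, c in zip(clusters, cents)
    )
    return min_gap / max_radius if max_radius > 0 else math.inf


two_clusters = [[[0, 0], [0, 2]], [[10, 0], [10, 2]]]
over_split = [[[0, 0]], [[0, 2]], [[10, 0], [10, 2]]]
print(separation_score(two_clusters))  # 10.0
print(separation_score(over_split))    # 2.0
```

Under this assumed score, the well-separated two-cluster partition rates higher than the over-split one, which is the kind of comparison a validation criterion would use to pick the number of clusters at a given level. Only MST edges are examined, so the score needs one edge per cluster beyond the first rather than every pairwise centroid distance.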
Similar resources
Hierarchical Clustering for Complex Data
In this paper we introduce a new tree-structured self-organizing neural network called a dynamical growing self-organizing tree (DGSOT). This DGSOT algorithm constructs a hierarchy from top to bottom by division. At each hierarchical level, the DGSOT optimizes the number of clusters, from which the proper hierarchical structure of the underlying data set can be found. We propose a Klevel up dis...
Genomic and proteomic analysis with dynamically growing self organising tree (DGSOT) for measuring clinical outcomes of cancer
Genomics and proteomics microarray technologies are used for analysing molecular and cellular expressions of cancer. This creates a challenge for analysis and interpretation of the data generated as it is produced in large volumes. The current review describes a combined system for genetic, molecular interpretation and analysis of genomics and proteomics technologies that offers a wide range of...
3. A New Hierarchical Approach for Image Clustering
The key problem in achieving efficient and user-friendly retrieval in the domain of image is the development of a search mechanism to guarantee delivery of minimal irrelevant information (high precision) while ensuring that relevant information is not overlooked (high recall). The unstructured format of images tends to resist the deployment of standard search mechanism and classification techni...
Binary tree-structured vector quantization approach to clustering and visualizing microarray data
MOTIVATION With the increasing number of gene expression databases, the need for more powerful analysis and visualization tools is growing. Many techniques have successfully been applied to unravel latent similarities among genes and/or experiments. Most of the current systems for microarray data analysis use statistical methods, hierarchical clustering, self-organizing maps, support vector mac...
Hierarchical Clustering of Gene Expression Data
Rapid development of biological technologies generates a huge amount of data, which provides a processing and global view of the gene expression levels across different conditions and over multiple stages. Analysis and interpretation of these massive data is a challenging task. One of the most important steps is to extract useful and rational fundamental patterns of gene expression inherent i...
Journal title:
- Bioinformatics
Volume 20, Issue 16
Pages -
Publication date 2004